Methods for modeling realistic playing in plucked-string synthesis: analysis, control and synthesis
Recent Advances in Physical Modeling with K- and W-Techniques
Physical (or physics-based) modeling of musical instruments is one of the main research fields in computer music. A basic question, which has recently attracted increasing research interest, is how the different discrete-time modeling paradigms are interrelated and can be combined, such that modeling with wave quantities (W-methods) and with Kirchhoff quantities (K-methods) can be understood within the same theoretical framework. This paper presents recent results from the HUT Sound Source Modeling group, both in the form of theoretical discussions and through examples of K- vs. W-modeling in the sound synthesis of musical instruments.
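To make the K/W distinction concrete, the following sketch (an illustration under simplifying assumptions, not code from the paper) simulates the same ideal lossless string twice: once as a digital waveguide operating on traveling-wave (W) quantities, and once as a finite-difference scheme operating directly on the physical displacement (K) quantity.

```python
# Illustrative sketch: the same ideal lossless string modeled with W-variables
# (digital waveguide: two traveling-wave delay lines) and with K-variables
# (finite-difference update on the physical displacement). All sizes are assumed.
import numpy as np

N = 64            # spatial points / delay-line length
steps = 200       # number of time steps to simulate

# --- W-method: digital waveguide -----------------------------------------
right = np.zeros(N)          # right-going traveling wave
left = np.zeros(N)           # left-going traveling wave
right[N // 2] = 0.5          # initial pluck split between both directions
left[N // 2] = 0.5
w_out = []
for _ in range(steps):
    # shift the rails and reflect (with inversion) at the fixed terminations
    new_right = np.roll(right, 1)
    new_left = np.roll(left, -1)
    new_right[0] = -left[0]      # reflection at the left end
    new_left[-1] = -right[-1]    # reflection at the right end
    right, left = new_right, new_left
    w_out.append(right[N // 2] + left[N // 2])   # physical output = sum of the waves

# --- K-method: finite-difference scheme on the displacement ---------------
y_prev = np.zeros(N)
y_curr = np.zeros(N)
y_curr[N // 2] = 1.0         # the pluck expressed directly in the K-variable
k_out = []
for _ in range(steps):
    y_next = np.zeros(N)
    # ideal-string update at the unit Courant number, fixed ends stay at zero
    y_next[1:-1] = y_curr[2:] + y_curr[:-2] - y_prev[1:-1]
    y_prev, y_curr = y_curr, y_next
    k_out.append(y_curr[N // 2])

print(np.round(w_out[:8], 3))
print(np.round(k_out[:8], 3))
```

Both loops model the same physical system; the difference lies purely in the choice of internal variables (traveling-wave components versus physical displacement), which is the kind of interrelation between W- and K-techniques that the paper discusses.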
Inferring the hand configuration from hand clapping sounds
In this paper, a technique for inferring the configuration of a clapper’s hands from a hand clapping sound is described. The method was developed based on analysis of synthetic and recorded hand clap sounds, labeled with the corresponding hand configurations. A naïve Bayes classifier was constructed to automatically classify the data using two different feature sets. The results indicate that the approach is applicable for inferring the hand configuration.
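As an illustration of the classification stage, the sketch below fits a naïve Bayes classifier on made-up acoustic features; the feature names, values, and configuration labels are placeholders and do not reproduce the paper's feature sets or data.

```python
# Minimal sketch of the classification stage: a naive Bayes classifier mapping
# acoustic features of a clap to a hand-configuration label. Features, values,
# and labels below are hypothetical placeholders for illustration only.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Each row: [spectral centroid in Hz, RMS level in dB] for one clap (made-up values).
X_train = np.array([
    [1200.0, -18.0],   # e.g. flat palm-to-palm clap
    [1900.0, -20.0],   # e.g. fingers-to-palm clap
    [1250.0, -17.5],
    [1850.0, -21.0],
])
y_train = ["P1", "A1", "P1", "A1"]   # hand-configuration labels (placeholder names)

clf = GaussianNB()
clf.fit(X_train, y_train)

# Infer the configuration of a new clap from its features.
print(clf.predict([[1300.0, -18.5]]))   # -> ['P1']
```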
Differentiable All-Pass Filters for Phase Response Estimation and Automatic Signal Alignment
Virtual analog (VA) audio effects are increasingly based on neural networks and deep learning frameworks. Due to the underlying black-box methodology, a successful model will learn to approximate the data it is presented with, including potential errors such as latency and audio dropouts as well as non-linear characteristics and frequency-dependent phase shifts produced by the hardware. The latter is of particular interest, as the learned phase response might cause unwanted audible artifacts when the effect is used for creative processing techniques such as dry-wet mixing or parallel compression. To overcome these artifacts, we propose differentiable signal processing tools and deep optimization structures for automatically tuning all-pass filters to predict the phase response of different VA simulations and to align processed signals that are out of phase. The approaches are assessed using objective metrics, while listening tests evaluate their ability to enhance the quality of parallel path processing techniques. Ultimately, an overparameterized, BiasNet-based all-pass model is proposed for the optimization problem under consideration, resulting in models that can estimate all-pass filter coefficients to align a dry signal with its affected, wet equivalent.
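The core idea of tuning an all-pass filter by gradient descent can be sketched as follows. This example uses a single first-order all-pass section with one learnable coefficient and a synthetic target; it is a simplified illustration of the optimization problem, not the BiasNet-based model proposed in the paper.

```python
# Illustrative sketch (assumption, not the paper's model): a first-order all-pass
# filter with a learnable coefficient, tuned by gradient descent so that the
# filtered dry signal aligns with a phase-shifted "wet" target.
import torch

torch.manual_seed(0)

def allpass(x, a):
    # First-order all-pass: y[n] = a*x[n] + x[n-1] - a*y[n-1]
    x_prev = torch.zeros(())
    y_prev = torch.zeros(())
    out = []
    for n in range(x.shape[0]):
        y_n = a * x[n] + x_prev - a * y_prev
        out.append(y_n)
        x_prev, y_prev = x[n], y_n
    return torch.stack(out)

fs = 1000.0
t = torch.arange(0, 0.2, 1.0 / fs)
dry = torch.sin(2 * torch.pi * 50.0 * t)

# "Wet" target: the same sine passed through an all-pass with a known coefficient,
# standing in for a phase-shifting virtual-analog effect.
with torch.no_grad():
    wet = allpass(dry, torch.tensor(0.4))

a = torch.tensor(0.0, requires_grad=True)      # learnable all-pass coefficient
opt = torch.optim.Adam([a], lr=0.05)

for step in range(100):
    opt.zero_grad()
    loss = torch.mean((allpass(dry, a) - wet) ** 2)
    loss.backward()
    opt.step()

print(f"estimated coefficient: {a.item():.3f}  (target 0.4)")
```

In the paper's setting, an overparameterized BiasNet-style network outputs the coefficients of a higher-order all-pass cascade, but the gradient-based fitting principle illustrated here is the same.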
Vocal Timbre Effects with Differentiable Digital Signal Processing
We explore two approaches to creatively altering vocal timbre using Differentiable Digital Signal Processing (DDSP). The first approach is inspired by classic cross-synthesis techniques. A pretrained DDSP decoder predicts a filter for a noise source and a harmonic distribution, based on pitch and loudness information extracted from the vocal input. Before synthesis, the harmonic distribution is modified by interpolating between the predicted distribution and the harmonics of the input. We provide a real-time implementation of this approach in the form of a Neutone model. In the second approach, autoencoder models are trained on datasets consisting of both vocal and instrument training data. To apply the effect, the trained autoencoder attempts to reconstruct the vocal input. We find that there is a desirable “sweet spot” during training, where the model has learned to reconstruct the phonetic content of the input vocals, but is still affected by the timbre of the instrument mixed into the training data. After further training, that effect disappears. A perceptual evaluation compares the two approaches. We find that the autoencoder in the second approach is able to reconstruct intelligible lyrical content without any explicit phonetic information provided during training.
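As an illustration of the interpolation step in the first approach, the sketch below blends a predicted harmonic distribution with harmonics measured from the input before synthesis; the array shapes, placeholder data, and the `mix` parameter are assumptions for illustration, not the Neutone implementation.

```python
# Minimal sketch of the harmonic-distribution interpolation in the first approach:
# blend the distribution predicted by the DDSP decoder with the harmonics measured
# from the vocal input before synthesis. Shapes and data are placeholders.
import numpy as np

n_frames, n_harmonics = 100, 60

# Placeholder data standing in for the decoder prediction and the analyzed input.
predicted = np.abs(np.random.randn(n_frames, n_harmonics))   # from the DDSP decoder
measured = np.abs(np.random.randn(n_frames, n_harmonics))    # from the vocal input

def blend_harmonics(predicted, measured, mix=0.5):
    """Linear interpolation between the two distributions, renormalized per frame."""
    blended = (1.0 - mix) * predicted + mix * measured
    return blended / (blended.sum(axis=1, keepdims=True) + 1e-8)

harmonic_distribution = blend_harmonics(predicted, measured, mix=0.7)
print(harmonic_distribution.shape)   # (100, 60), ready for the harmonic synthesizer
```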